Many datasets are biased: they contain easy-to-learn features that are highly correlated with the target class in the dataset but not in the true underlying distribution of the data. For this reason, learning unbiased models from biased data has become a highly relevant research topic in recent years. In this work, we tackle the problem of learning representations that are robust to biases. We first present a margin-based theoretical framework that clarifies why recent contrastive losses (InfoNCE, SupCon, etc.) can fail when dealing with biased data. Building on it, we derive a novel formulation of the supervised contrastive loss (epsilon-SupInfoNCE) that provides more accurate control of the minimal distance between positive and negative samples. Furthermore, thanks to our theoretical framework, we propose FairKL, a new debiasing regularization loss that works well even with extremely biased data. We validate the proposed losses on standard vision datasets, including CIFAR10, CIFAR100, and ImageNet, and we assess the debiasing capability of FairKL combined with epsilon-SupInfoNCE, reaching state-of-the-art performance on a number of biased datasets, including real instances of biases in the wild.
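As an illustration, here is a minimal PyTorch sketch of an ε-margin supervised contrastive loss in the spirit of epsilon-SupInfoNCE; the function name, the temperature scaling, and the exact placement of the margin are our assumptions, not the paper's reference implementation.

```python
import torch
import torch.nn.functional as F

def eps_sup_infonce(features, labels, epsilon=0.1, temperature=0.1):
    # Hypothetical sketch: for each anchor, every same-label sample is a
    # positive, and the margin epsilon enforces a minimal gap between
    # positive and negative similarities inside the log-ratio.
    z = F.normalize(features, dim=1)                  # (N, D) unit embeddings
    sim = z @ z.T / temperature                       # scaled cosine similarities
    n = sim.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=sim.device)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask

    losses = []
    for i in range(n):
        negs = sim[i][~pos_mask[i] & ~self_mask[i]]   # negatives of anchor i
        for s_p in sim[i][pos_mask[i]]:
            shifted = s_p - epsilon / temperature     # margin applied to the positive
            logits = torch.cat([shifted.view(1), negs])
            # -log( exp(s_p - eps) / (exp(s_p - eps) + sum_n exp(s_n)) )
            losses.append(torch.logsumexp(logits, 0) - shifted)
    return torch.stack(losses).mean()
```

Since the similarities are already divided by the temperature, the margin is rescaled by the same factor so that it acts in the original similarity space.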
From the moment neural networks came to dominate image processing, the computational complexity required to solve the target tasks has skyrocketed: against this unsustainable trend, many strategies have been developed that ambitiously target the preservation of performance. Promoting sparse topologies, for example, allows the deployment of deep neural network models on embedded, resource-constrained devices. Recently, capsule networks were introduced to enhance the explainability of a model, where each capsule is an explicit representation of an object or of its parts. These models show promising results on toy datasets, but their low scalability prevents deployment on more complex tasks. In this work, we explore sparsity, by reducing the number of capsules, in order to improve their computational efficiency. We show how pruning capsule networks achieves high generalization with lower memory requirements, computational effort, and inference and training time.
Recent advances in deep learning optimization have shown that, with some a-posteriori information on fully-trained models, it is possible to match the same performance by simply training a subset of their parameters. Such a discovery has a broad impact from theory to applications, driving research towards methods that identify the minimum subset of parameters to train without looking at a-posteriori information. However, the proposed approaches do not match state-of-the-art performance and rely on unstructured, sparsely connected models. In this work, we shift the focus from single parameters to the behavior of the whole neuron, exploiting the concept of neuronal equilibrium (NEq). When a neuron is in equilibrium (meaning that it has learned a specific input-output relation), we can halt its update; on the contrary, when a neuron is at non-equilibrium, we let its state evolve towards equilibrium by updating its parameters. The proposed approach has been tested on different state-of-the-art learning strategies and tasks, validating NEq and observing that neuronal equilibrium depends on the specific learning setup.
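A minimal sketch of the equilibrium test is given below, assuming the criterion is a thresholded cosine similarity between each neuron's responses on a fixed validation set at two consecutive epochs; the paper's actual criterion, and the way updates are halted, may differ.

```python
import torch
import torch.nn.functional as F

def equilibrium_mask(prev_outputs, curr_outputs, eps=1e-3):
    # prev_outputs, curr_outputs: (num_neurons, num_val_samples) responses of
    # every neuron over the same validation set at two consecutive epochs.
    prev = F.normalize(prev_outputs, dim=1)
    curr = F.normalize(curr_outputs, dim=1)
    similarity = (prev * curr).sum(dim=1)     # per-neuron cosine similarity
    return similarity > 1.0 - eps             # True -> neuron at equilibrium

# During training, halt the updates of neurons at equilibrium, e.g. for a
# Linear layer whose weight rows correspond to neurons:
#   layer.weight.grad[frozen] = 0.0
```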
Nowadays, deep learning models are widely deployed to solve a broad variety of tasks. However, little attention has been paid to the associated legal aspects. In 2016, the European Union approved the General Data Protection Regulation (GDPR), which entered into force in 2018. Its main rationale is to protect the privacy and the data of its citizens with respect to the operation of the so-called "data economy". Since data are the fuel of modern artificial intelligence, it is argued that the GDPR can be partially applied to a series of algorithmic decision-making tasks before more structured AI regulations enter into force. In the meantime, AI should not allow undesired information leakage that deviates from the purpose for which it was created. In this work, we propose DisP, an approach for deep learning models that removes, from the data processed by the AI, the information related to certain private classes. In particular, DisP is a regularization strategy that de-correlates, at training time, the features belonging to the same private class, hiding the information about private-class membership. Our experiments on state-of-the-art deep learning models show the effectiveness of DisP, minimizing the risk of extraction for the classes we wish to keep private.
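A minimal sketch of such a de-correlation penalty is shown below, assuming a cosine-similarity form; the exact regularizer used by DisP may differ, and the function name is ours.

```python
import torch
import torch.nn.functional as F

def private_decorrelation_penalty(features, private_labels):
    # Penalize pairwise feature similarity among samples sharing the same
    # private class, pushing their representations apart so that membership
    # in the private class is no longer recoverable from the features.
    z = F.normalize(features, dim=1)
    sim = z @ z.T                                   # pairwise cosine similarity
    same = private_labels.unsqueeze(0).eq(private_labels.unsqueeze(1))
    same.fill_diagonal_(False)                      # ignore self-similarity
    if same.sum() == 0:
        return features.new_zeros(())
    return sim[same].mean()                         # add lambda * penalty to the task loss
```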
Computational units in artificial neural networks follow a simplified model of biological neurons. In the biological model, the output signal of a neuron runs down the axon, splits following the many branches at its end, and passes identically to all the downstream neurons of the network. Each of the downstream neurons uses its copy of this signal as one of many dendritic inputs, integrates them all, and fires an output if the result is above some threshold. In the artificial neural network, this translates to the nonlinear filtering of the signal being performed in the upstream neuron, meaning that in practice the same activation is shared by all the downstream neurons that use that signal as their input. Dendrites thus play a passive role. We propose a slightly more complex model of the biological neuron, in which dendrites play an active role: the activation in the output of the upstream neuron becomes optional, and instead the signal going through each dendrite undergoes an independent nonlinear filtering before the linear combination. We implement this new model as a ReLU computational unit and discuss its biological plausibility. We compare this new computational unit with the standard one and describe it from a geometrical point of view. We provide a Keras implementation of this unit for fully connected and convolutional layers and estimate the change in FLOPs and weights. We then use these layers in ResNet architectures on CIFAR-10, CIFAR-100, Imagenette, and Imagewoof, obtaining performance improvements over standard ResNets of up to 1.73%. Finally, we prove a universal representation theorem for continuous functions on compact sets and show that this new unit has more representational power than its standard counterpart.
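Since the abstract mentions a Keras implementation, here is a hedged Keras sketch of one plausible reading of the unit, in which each connection (dendrite) applies its own ReLU, with its own bias, before the linear combination across dendrites; the paper's exact parameterization may differ.

```python
import tensorflow as tf

class DendriticDense(tf.keras.layers.Layer):
    """Sketch of a dense layer with per-dendrite nonlinear filtering.

    Instead of sharing a single activation computed in the upstream neuron,
    every input-to-neuron connection filters its incoming signal with its
    own ReLU before the summation. Illustrative, not the paper's code.
    """
    def __init__(self, units):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        d = int(input_shape[-1])
        self.w = self.add_weight(shape=(d, self.units), initializer="glorot_uniform")
        self.b = self.add_weight(shape=(d, self.units), initializer="zeros")

    def call(self, x):
        # (batch, d, 1) * (d, units) -> per-dendrite activations (batch, d, units)
        dendrites = tf.nn.relu(x[..., tf.newaxis] * self.w + self.b)
        return tf.reduce_sum(dendrites, axis=1)  # combine across dendrites
```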
The open radio access network (O-RAN) embraces cloudification and network function virtualization for baseband function processing by disaggregated radio units (RUs), distributed units (DUs), and centralized units (CUs). These enable the cloud-RAN vision in full, where multiple mobile network operators (MNOs) can install their proprietary or open RUs, but lease on-demand computational resources for DU-CU functions from commonly available open clouds via open x-haul interfaces. In this paper, we propose and compare the performance of min-max fairness and Vickrey-Clarke-Groves (VCG) auction-based x-haul and DU-CU resource allocation mechanisms to create a multi-tenant O-RAN ecosystem that is sustainable for small, medium, and large MNOs. The min-max fair approach minimizes the maximum OPEX of RUs through cost-sharing proportional to their demands, whereas the VCG auction-based approach minimizes the total OPEX for all resources utilized while extracting truthful demands from RUs. We consider time-wavelength division multiplexed (TWDM) passive optical network (PON)-based x-haul interfaces, where a PON virtualization technique is used to flexibly provide optical connections among RUs and edge clouds at macro-cell RU locations as well as open clouds at central office locations. Moreover, we design efficient heuristics that yield significantly better economic efficiency and network resource utilization than conventional greedy resource allocation algorithms and reinforcement learning-based algorithms.
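To make the auction side concrete, below is a toy Python sketch of VCG (Clarke pivot) pricing for a single pool of resource blocks; the names, the brute-force welfare search, and the single capacity constraint are our simplifications of the paper's multi-resource mechanism.

```python
from itertools import combinations

def vcg_payments(bids, capacity):
    # bids: {mno_name: (demand_in_blocks, value)} reported valuations.
    # Winners maximize total value subject to capacity; each winner pays
    # the externality it imposes on the other bidders.
    def best(names, cap):
        # Brute-force welfare maximization over subsets (fine for a toy case).
        best_val, best_set = 0, ()
        for r in range(len(names) + 1):
            for subset in combinations(names, r):
                d = sum(bids[n][0] for n in subset)
                v = sum(bids[n][1] for n in subset)
                if d <= cap and v > best_val:
                    best_val, best_set = v, subset
        return best_val, best_set

    _, winners = best(list(bids), capacity)
    payments = {}
    for w in winners:
        without_w, _ = best([n for n in bids if n != w], capacity)
        with_w = sum(bids[n][1] for n in winners if n != w)
        payments[w] = without_w - with_w      # Clarke pivot payment
    return winners, payments
```

For instance, with bids {"small": (2, 5), "medium": (3, 8), "large": (5, 12)} and capacity 6, the small and medium MNOs win and pay 4 and 7 respectively, each below its stated value, which illustrates why truthful reporting is a dominant strategy.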
When testing conditions differ from those represented in training data, so-called out-of-distribution (OOD) inputs can mar the reliability of black-box learned components in the modern robot autonomy stack. Therefore, coping with OOD data is an important challenge on the path towards trustworthy learning-enabled open-world autonomy. In this paper, we aim to demystify the topic of OOD data and its associated challenges in the context of data-driven robotic systems, drawing connections to emerging paradigms in the ML community that study the effect of OOD data on learned models in isolation. We argue that as roboticists, we should reason about the overall system-level competence of a robot as it performs tasks in OOD conditions. We highlight key research questions around this system-level view of OOD problems to guide future research toward safe and reliable learning-enabled autonomy.
Autoencoders are a popular model in many branches of machine learning and lossy data compression. However, their fundamental limits, the performance of gradient methods and the features learnt during optimization remain poorly understood, even in the two-layer setting. In fact, earlier work has considered either linear autoencoders or specific training regimes (leading to vanishing or diverging compression rates). Our paper addresses this gap by focusing on non-linear two-layer autoencoders trained in the challenging proportional regime in which the input dimension scales linearly with the size of the representation. Our results characterize the minimizers of the population risk, and show that such minimizers are achieved by gradient methods; their structure is also unveiled, thus leading to a concise description of the features obtained via training. For the special case of a sign activation function, our analysis establishes the fundamental limits for the lossy compression of Gaussian sources via (shallow) autoencoders. Finally, while the results are proved for Gaussian data, numerical simulations on standard datasets display the universality of the theoretical predictions.
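For concreteness, the setting can be written as follows; the notation below is ours, chosen for illustration, and the paper's conventions may differ.

```latex
% A shallow (two-layer) autoencoder with input dimension d and code size k:
\[
  \hat{x} = B\,\sigma(Ax), \qquad A \in \mathbb{R}^{k\times d},\; B \in \mathbb{R}^{d\times k},
\]
% trained to minimize the population risk on Gaussian data,
\[
  \mathcal{R}(A,B) = \mathbb{E}_{x\sim\mathcal{N}(0,I_d)}\!\left[\lVert x - B\,\sigma(Ax)\rVert_2^2\right],
  \qquad \frac{k}{d}\to r \in (0,\infty) \quad \text{(proportional regime)}.
\]
```

With the sign activation, $\sigma(u)=\mathrm{sign}(u)$, the code is a $k$-bit string, so the minimal risk measures the distortion of compressing a Gaussian source at rate $r$ bits per input dimension.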
Profile extrusion is a continuous production process for manufacturing plastic profiles from molten polymer. Especially interesting is the design of the die, through which the melt is pressed to attain the desired shape. However, due to an inhomogeneous velocity distribution at the die exit or residual stresses inside the extrudate, the final shape of the manufactured part often deviates from the desired one. To avoid these deviations, the shape of the die can be computationally optimized, which has already been investigated in the literature using classical optimization approaches. A new approach in the field of shape optimization is the use of Reinforcement Learning (RL) as a learning-based optimization algorithm. RL is based on trial-and-error interactions of an agent with an environment. For each action, the agent is rewarded and informed about the subsequent state of the environment. While not necessarily superior to classical optimization algorithms (e.g., gradient-based or evolutionary ones) on a single problem, RL techniques are expected to perform especially well when similar optimization tasks are repeated, since the agent learns a more general strategy for generating optimal shapes instead of concentrating on just one problem. In this work, we investigate this approach by applying it to two 2D test cases. The flow-channel geometry can be modified by the RL agent using so-called Free-Form Deformation, a method in which the computational mesh is embedded into a transformation spline, which is then manipulated through its control-point positions. In particular, we investigate the impact of different agents on the training progress and the potential for saving wall time by utilizing multiple environments during training.
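As a sketch of the geometry pipeline, here is a minimal NumPy implementation of classical Bernstein-polynomial Free-Form Deformation in 2D; the lattice layout and normalization are our assumptions, and the paper's spline parameterization may differ.

```python
import numpy as np
from math import comb

def ffd_2d(points, control_points):
    # points: (N, 2) mesh coordinates, assumed normalized to the unit square.
    # control_points: (L+1, M+1, 2) lattice; displacing its entries deforms
    # every embedded mesh point smoothly.
    L, M = control_points.shape[0] - 1, control_points.shape[1] - 1
    s, t = points[:, 0], points[:, 1]
    deformed = np.zeros_like(points)
    for i in range(L + 1):
        bi = comb(L, i) * (1 - s) ** (L - i) * s ** i      # Bernstein basis in s
        for j in range(M + 1):
            bj = comb(M, j) * (1 - t) ** (M - j) * t ** j  # Bernstein basis in t
            deformed += (bi * bj)[:, None] * control_points[i, j]
    return deformed
```

With the undeformed lattice control_points[i, j] = (i / L, j / M), the mapping is the identity; the RL agent would then act by displacing these control points.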
The recent emergence of new algorithms for permuting models into functionally equivalent regions of the solution space has shed some light on the complexity of error surfaces and on promising properties such as mode connectivity. However, finding the right permutation is challenging, and current optimization techniques are not differentiable, which makes them difficult to integrate into gradient-based pipelines and often leads to sub-optimal solutions. In this paper, we propose a Sinkhorn re-basin network with the ability to obtain the transportation plan that best suits a given objective. Unlike the current state of the art, our method is differentiable and, therefore, easy to adapt to any task within the deep learning domain. Furthermore, we show the advantage of our re-basin method by proposing a new cost function that enables incremental learning by exploiting the linear mode connectivity property. The benefit of our method is compared against similar approaches from the literature under several conditions, for both optimal transport finding and linear mode connectivity. The effectiveness of our continual-learning method based on re-basin is also shown on several common benchmark datasets, with experimental results that are competitive with the state of the art.
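The differentiable relaxation at the core of such an approach can be sketched with the standard Sinkhorn operator; a log-domain version is shown below, with illustrative hyperparameter values.

```python
import torch

def sinkhorn(log_alpha, n_iters=20, tau=0.1):
    # Repeated row/column normalization in log-space turns an arbitrary score
    # matrix into a (near) doubly-stochastic matrix; as tau -> 0 it approaches
    # a hard permutation while every step remains differentiable, so the
    # matching can sit inside a gradient-based pipeline.
    log_p = log_alpha / tau
    for _ in range(n_iters):
        log_p = log_p - torch.logsumexp(log_p, dim=1, keepdim=True)  # rows
        log_p = log_p - torch.logsumexp(log_p, dim=0, keepdim=True)  # columns
    return log_p.exp()  # soft permutation aligning one network's units to another's
```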